Selective Rademacher Penalization and Reduced Error Pruning of Decision Trees

Authors

  • Matti Kääriäinen
  • Tuomo Malinen
  • Tapio Elomaa
Abstract

Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. It appears to be limited to relatively simple hypothesis classes because of computational complexity issues. In this paper we, nevertheless, apply Rademacher penalization to the in practice important hypothesis class of unrestricted decision trees by considering the prunings of a given decision tree rather than the tree growing phase. This study constitutes the first application of Rademacher penalization to hypothesis classes that have practical significance. We present two variations of the approach, one in which the hypothesis class consists of all prunings of the initial tree and another in which only the prunings that are accurate on growing data are taken into account. Moreover, we generalize the error-bounding approach from binary classification to multi-class situations. Our empirical experiments indicate that the proposed new bounds outperform distribution-independent bounds for decision tree prunings and provide non-trivial error estimates on real-world data sets.


Similar Articles

Learning Small Trees and Graphs that Generalize

In this Thesis we study issues related to learning small tree and graph formed classifiers. First, we study reduced error pruning of decision trees and branching programs. We analyze the behavior of a reduced error pruning algorithm for decision trees under various probabilistic assumptions on the pruning data. As a result we get, e.g., new upper bounds for the probability of replacing a tree t...

Full Text

Rademacher Penalization over Decision Tree Prunings

Decision Tree Prunings. Matti Kääriäinen and Tapio Elomaa, Department of Computer Science, University of Helsinki, Finland. {matti.kaariainen,elomaa}@cs.helsinki.fi. Abstract. Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. It would appear to be limited to relatively simple hypothesis classes because of computational comple...

Full Text

Model selection using Rademacher Penalization

In this paper we describe the use of Rademacher penalization for model selection. As in Vapnik's Guaranteed Risk Minimization (GRM), Rademacher penalization attempts to balance the complexity of the model with its fit to the data by minimizing the sum of the training error and a penalty term, which is an upper bound on the absolute difference between the training error and the generalization error...

Full Text

Reduced-Error Pruning with Significance Tests

When building classification models, it is common practice to prune them to counter spurious effects of the training data: this often improves performance and reduces model size. "Reduced-error pruning" is a fast pruning procedure for decision trees that is known to produce small and accurate trees. Apart from the data from which the tree is grown, it uses an independent "pruning" set, and prunin...

Full Text

The Difficulty of Reduced Error Pruning of Leveled Branching Programs

Induction of decision trees is one of the most successful approaches to supervised machine learning. Branching programs are a generalization of decision trees and, by the boosting analysis, exponentially more efficiently learnable than decision trees. In experiments this advantage has not been seen to materialize. Decision trees are easy to simplify using pruning. For branching programs no such a...

Full Text


Journal:
  • Journal of Machine Learning Research

Volume 5, Issue -

Pages -

Publication year: 2004